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A  New  Method  of  Edge  Correction  for  Estimating 
the  Nearest  Neighbor  Distribution 


Ernesto  M.  Floresroux 
Michael  L,  Stein 


Abstract 

Analysis  of  data  in  the  form  of  a  set  of  points  irregularly  distributed  within  a  region 
of  space  usually  involves  the  study  of  some  property  of  the  distribution  of  inter- 
event  distances.  One  such  function  is  G,  the  distribution  of  the  distance  from  an 
event  to  its  nearest  neighbor.  In  practice,  point  processes  are  commonly  observed 
through  a  bounded  window,  thus  making  edge  effects  an  important  component  in 
the  estimation  of  G.  Several  estimators  have  been  proposed,  all  dealing  with  the 
edge  effect  problem  in  different  ways.  This  paper  proposes  a  new  alternative  for 
estimating  the  nearest  neighbor  distribution  and  compares  it  to  other  estimators. 
The  result  is  an  estimator  with  relatively  small  mean  squared  error  for  a  wide  variety 
of  stationary  processes. 


1.  Introduction 

Spatial  point  processes  have  been  widely  used  as  models  in  many  scientific  and  technological 
fields  including  ecology  and  biology  (Biggie,  1983),  astronomy  (Neyman  and  Scott,  1958),  and 
archeology  (Donnelly,  1978).  Exhaustive  sampling  of  patterns  is  becoming  more  commonplace 
because  images  can  be  easily  digitized,  and  data  sets  in  such  a  form  are  more  readily  available. 
In  order  to  describe,  analyze,  and  make  inference  on  such  patterns,  properties  of  the  distribution 
of  inter-event  distances  are  usually  studied.  There  is  considerable  work  in  estimating  the  reduced 
second  moment  measure  (Ripley,  1988;  Stein,  1993),  the  empty  space  function  (Baddeley  and  Gill. 
1993),  and  the  nearest  neighbor  distribution  (Biggie,  1983).  This  work  proposes  a  new  way  of 
estimating  the  nearest  neighbor  distribution  function,  G(^),  which  is  defined  as  the  probability 
that  a  baU  of  radius  E  centered  at  an  arbitrary  event  of  the  process,  contains  at  least  one  other 
event:  this  is  equivalent  to  the  probability  that  the  distance  between  an  arbitrary  event  and  its 
nearest  neighbor  is  less  than  or  equal  to  t.  Heuristically,  by  an  arbitrary  event  we  mean  that  we 
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have  a  large  but  finite  number  of  events  in  some  finite  region,  and  one  of  these  events  is  selected 
by  simple  random  sampling.  A  precise  definition  can  be  found  in  Daley  and  Vere-Jones  (1988). 

An  example  of  when  G  is  a  useful  description  of  a  spatial  point  process  is  the  locations 
of  trees  in  a  forest  (Diggle.  1983).  A  simple  model  for  their  locations  would  be  a  homogeneous 
Poisson  process  with  intensity  A,  for  which  G{t)  =  1  —  exp(”“At^)  in  two  dimensions.  However, 
due  to  the  nature  of  the  process  of  seed  dispersal,  there  may  be  more  clumping  of  trees  than  would 
occur  under  the  Poisson  model.  Alternatively,  there  may  be  a  competitive  relationship  between 
trees  that  would  cause  trees  to  be  more  evenly  spaced  than  under  the  Poisson  model.  The  nearest 
neighbor  distribution  provides  one  method  of  distinguishing  between  these  various  possibilities  and 
is  particularly  relevant  when  dependencies  of  the  process  over  short  distances  are  of  interest. 

Since  mapped  patterns  are  usually  observed  through  windows  with  boundaries,  events  close 
to  the  boundary  of  the  observation  region  might  have  their  true  nearest  neighbor  outside  it.  This 
makes  edge  effects  an  important  component  in  the  estimation  of  G,  and  any  reasonable  estimator 
should  account  for  them.  Several  estimators  have  been  proposed  in  the  literature  (Baddeley  and 
GiU,  1993;  Doguwa  and  Upton,  1990;  Hanisch,  1984;  Ripley,  1977),  all  of  which  deal  with  the  edge 
effect  problem  in  a  different  way.  For  observations  near  the  boundary  of  study,  the  ball  of  radius  t 
around  these  events  is  not  entirely  observed,  rendering  incomplete  information  about  the  nearest 
neighbor.  Instead  of  discarding  this  information,  a  simple  method  for  imputing  the  probability  of 
an  event  in  this  ball  conditioning  on  the  data  is  proposed.  It  is  based  on  considering  other  events 
in  the  observed  part  of  the  process  for  which  a  translation  (if  the  process  is  stationary)  and  a 
rotation  (if  the  process  is  stationary  and  isotropic)  make  it  comparable  in  an  appropriate  sense  to 
the  problem  event.  The  result  is  an  estimator  that  has  a  smaller  mean  squared  error  than  the  two 
most  commonly  used  estimators  and  its  bias  is  of  roughly  the  same  order. 

Furthermore,  as  opposed  to  Doguwa  and  Upton’s  estimator,  it  performs  well  in  certain 
large  sample  scenarios  where  edge  effects  are  kept  approximately  constant  as  the  observed  region 
grows.  For  this  purpose,  suppose  $  is  a  stationary  point  process  on  IR^.  If  the  observed  area  is  a 
rectangle,  keeping  one  side  constant  while  letting  the  other  one  increase,  edge  effects  will  be  severe 
even  though  the  total  area  and  the  number  of  observed  points  tend  to  infinity.  A  second  scenario 
where  edge  effects  do  not  diminish  occurs  if  instead  of  samphng  only  one  window,  the  process  is 
observed  through  a  w^hole  set  of  similarly  sized  windows.  Baddeley,  et.  al.  (1993)  give  an  example 
for  a  three-dimensional  spatial  point  process  in  which  the  events  are  locations  of  a  certain  feature 
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in  monkey  skulls  and  multiple  windows  arise  because  the  point  pattern  is  mapped  in  a  number  of 
well  separated  sampling  volumes  from  the  skull. 

To  study  the  performance  of  the  estimators  in  different  scenarios,  three  qualitatively  differ¬ 
ent  point  processes  covering  a  wide  spectrum  of  alternatives  are  used  as  examples.  As  an  example 
of  an  aggregated  process,  a  Xeyman-Scott  cluster  process  (Neyman  &  Scott.  1958)  is  analyzed, 
lor  a  process  exhibiting  some  regularity  or  repulsion,  a  perturbed  grid  is  chosen.  In  such  a  model, 
points  on  a  grid  are  randomly  moved  according  to  a  certain  distribution.  Finally,  since  no  analysis 
is  complete  without  studying  the  behavior  under  what  is  considered  the  canonical  model  for  point 
processes,  a  homogeneous  Poisson  process  is  considered. 

Section  2  defines  some  necessary  notation.  Section  3  reviews  previously  suggested  estima¬ 
tors  for  the  nearest  neighbor  distribution.  Section  4  defines  the  new  estimator.  For  simplicity,  it 
is  only  defined  for  processes  in  IR^.  although  the  basic  idea  behind  it  applies  in  any  number  of 
dimensions.  Section  5  provides  a  heuristic  argument  as  to  why  our  estimator  should  be  preferred  to 
the  one  suggested  by  Doguwa  and  Upton  (1990).  Section  6  describes  the  results  of  the  simulation 
study. 

2.  Notation 

Suppose  $  is  a  simple  stationary  point  process  in  IR^  and  only  a  part  of  it  is  observed 
through  a  bounded  window  W  C  IR^.  As  is  customary  in  the  literature,  $  wiU  denote  both  a 
random  set  in  IR^  and  a  random  measure.  For  any  Borel  set  A  C  IR^,  the  random  variable  ^(A) 
counts  the  number  of  points  of  $  that  fall  in  A,  Let  Vd[A)  represent  the  d— dimensional  volume  of 
the  set  A:  the  subscript  d  will  be  suppressed  if  its  meaning  is  unambiguous  from  the  context.  For 
any  two  points  Fi,P2  ^  the  Euclidean  distance  between  them  is  denoted  by  d(Pi,P2)-  Finally, 
b{x,t)  represents  the  ball  of  radius  t  around  the  point  x. 

The  objective  is  to  estimate  G.  the  nearest  neighbor  distribution  function  for  the  underlying 
point  process  Denote  the  events  in  W  by  Pi,  •  •  -  .Pn.  so  that  $(Lr)  =  n.  For  every  event  Pf,  bi 
denotes  its  distance  to  the  nearest  border  of  W  and  w;  the  distance  to  its  nearest  neighbor  inside 
lU.  That  is,  Wi  =  minj^t  {d(P^,  Pj)}. 

For  all  observed  events  P^,  Doguwa  and  Upton  ( 1990)  analyze  the  six  possible  orderings  of 
the  distance  t  at  which  we  are  estimating  G',  the  distance  b{  to  the  nearest  border,  and  the  distance 
ivi  to  the  nearest  neighbor.  For  five  cases,  it  is  known  with  all  certainty  whether  the  nearest 
neighbor  is  within  t  from  the  point  of  interest.  This  is  not  true  in  the  case  where  bi  <  t  <  Wi 
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(hereafter,  this  situation  will  be  referred  to  as  the  maybe  case),  since  the  part  of  the  circle  that  is 
not  observed  may  or  may  not  contain  an  event  of  the  process.  For  ail  cases,  including  this  last  one, 
for  large  enough  the  balls  of  radius  t  around  most  events  intersect  the  border,  and  thus  are  not 
completely  observed. 

Following  Doguwa  and  Upton's  (1990)  notation,  define  the  function  f{u.  v)  for  li.  r  G  IR  as 
f{u.  c  .i  =  /{u  <  f},  where  I  is  an  indicator  function.  For  the  observed  window  \V\  denotes 
the  window  W  eroded  by  a  ball  of  radius  i.  which  can  be  written  as  W'^t  =  {x  G  \V\b{x.t)  C  14^}. 
Thus.  is  the  set  of  points  in  W  that  are  more  than  a  distance  t  away  from  the  border. 


3.  Existing  Estimators 

3.1  Reduced  Sample  Estimator 


The  most  widely  used  estimator,  usually  referred  to  as  the  reduced  sample  estimator,  was 
proposed  by  Biggie  (1979)  following  Ripley  (1977).  For  those  events  that  are  more  than  a  distance 
t  away  from  the  boundary  (aU  events  Pi  such  that  t  <  bp,  that  is.  Pi  G  the  whole  circle 

of  radius  t  around  them  is  observed.  Thus,  with  all  certainty  it  can  be  said  whether  the  nearest 
neighbor  is  within  t.  By  selecting  those  events  Pi  G  W  for  which  this  condition  is  met.  that  is, 
by  allowing  a  guard  area  around  the  border,  the  sample  gets  restricted  to  events  that  satisfy  only 
three  cases  out  of  the  six  possible  orderings  of  the  ranks  of  b{,  Wi,  and  t.  As  t  grows,  the  size  of  the 
available  sample  decreases,  and  for  large  enough  t,  it  will  be  zero.  Formally,  this  estimator  can  be 
written  as 


Gi(t) 


Er=i  f{t.b^)f{wij) 
EUfit.b,) 


This  estimator  of  G  counts  the  number  of  events  in  the  eroded  window  W^t  for  which  the  nearest 
neighbor  is  observed  within  and  divides  it  by  the  total  number  of  events  in  Wc^t.  .Although 
this  method  is  intuitively  appealing,  it  has  certain  drawbacks.  Its  range  of  estimation  is  quite 
limited:  for  example,  in  the  case  of  a  rectangle  of  sides  Sx  and  Sy,  the  largest  value  of  t  for  which 
the  estimation  is  possible  is  t  <  min{sa;,  Sy}/2.  This  estimator  sometimes  renders  a  nonmonotone 
empirical  distribution  function,  and  for  large  t,  the  reduced  sample  estimator  might  be  undefined. 


3.2  Hanisch’s  Estimator 


Hanisch  (1984)  proposed  an  alternative  estimator  for  G,  Instead  of  restricting  the  study  to 
those  events  that  lie  in  Wc^t,  attention  is  focused  on  all  objects  for  which  the  distance  to  the  nearest 
neighbor  is  known.  This  is  the  subset  of  events  G  $  fi  W  for  which  Wi  <  bi.  This  estimator 
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restricts  itself  to  a  different  subset  of  events  than  the  reduced  sample  estimator.  It  can  be  written 
as 

'£.7=1 

^  2 1  ^  j  yl  r  /  A  \  /  \ 

f{wi.bOp{wi) 

A  weight  function  p  is  added  to  account  for  the  area  within  W  in  which  any  pair  of  events  separated 
by  a  distance  t  could  lie.  In  general.  p( r  i  =  Ij v{W-:^z)^  which  renders  p(z)  =  (5:^  ~  2-2:)”^(Sy  —  2z)“^ 
tor  a  rectangle  of  sides  5^  and  5^.  The  purpose  of  this  function  is  to  make  G2  a  ratio-unbiased 
estimator  of  G:  that  is,  the  ratio  of  expectations  of  the  numerator  and  denominator  is  equal  to  the 
true  nearest  neighbor  distribution.  Such  a  property  is  often  encountered  in  estimators  of  this  kind. 

Stoyan.  KendaU  &  Mecke  (1987.  p.  128)  proposed  another  version  of  this  estimator  that 
does  not  take  into  account  the  weight  function.  We  made  a  comparative  analysis  of  these  two 
versions  and  we  found  no  reason  for  not  using  p,  except  for  a  minor  increase  in  computational 
effort,  and  ratio-unbiasedness  is  lost  by  doing  so. 

The  estimator  G2  is  well  defined  if  at  least  one  event  in  the  study  region  has  its  nearest 
neighbor  closer  to  it  than  the  border.  Conditional  on  the  observed  process,  the  denominator  is 
constant,  and  thus  the  estimator  is  nondecreasing. 

3.3  Doguwa  and  Upton’s  Estimator 

Doguwa  and  Upton  (1990)  proposed  another  estimator  that  avoids  some  of  the  problems 
encountered  by  Gi  and  6^2*  It  is  weU  defined  unless  ^(W)  —  0,  and  it  is  monotone  in  t.  Most 
importantly,  instead  of  just  considering  those  events  Pi  for  which  only  certain  orderings  of  the 
distances  and  Wi  are  satisfied,  it  accounts  for  aU  the  events  simultaneously.  For  an  event 

Pi  for  which  bi  <  t  <  Wi  (the  maybe  case),  an  ad  hoc  method  of  imputation  is  proposed.  Figure 
1  illustrates  the  problem  solved  by  this  method.  It  works  weU  in  certain  cases,  but  it  makes  an 
inappropriate  correction  when  the  process  is  badly  non-Poisson. 

To  construct  such  an  estimator,  the  number  of  events  Pi  G  $  fi  fU  for  which  Wi  <  t  is 
counted.  This  is  known  with  aU  certainty  in  five  out  of  the  six  ordering  of  t,  bi^  and  Wi,  For 
the  maybe  case,  only  part  of  the  baU  of  radius  t  around  such  an  event  is  observed  in  W.  and 
no  other  events  of  the  process  lie  in  it.  Denote  by  Ip^{t)  =  b{Pi,t)  n  W  the  part  of  b(Pi,t) 
that  is  inside  W,  and  by  Op^{t)  =  6(P,.  t)  n  the  part  outside  W.  The  objective  is  to  estimate 
H(t:P^)  =  p|$(Op,(^))  >  0|Pi,  $(/py;n)  =  ij,  the  conditional  probability  of  finding  at  least  one 
event  in  the  unobserved  part  of  the  baU  of  radius  t  around  the  event  Pi  given  that  there  are  no 
other  events  in  the  observed  part. 
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If  the  process  is  assumed  to  be  homogeneous  Poisson,  this  imputation  is  trivial  since  the 
numbers  of  events  of  the  process  in  disjoint  regions  are  independent.  Let  uq  =  uifr).  and  let 
Vi  =  and  hi  =  be  the  areas  of  the  observed  and  unobserved  parts  of  b{Pi^t) 

respectively;  clearly,  i';  -r  h,  =  in  two  dimensions.  For  a  Poisson  process  with  intensity  A, 
H{t;Pi)  =  1  -  exp(“Aftj).  Estimating  A  by  the  standard  estimator  X  =  n/vo  yields  H(t:Pi)  = 
1  -  exp|-A/2.|. 

This  probability  has  to  be  estimated  for  every  event  Pi  for  which  bi  <  t  <  Wi.  Notice  that 
t  is  considered  fixed,  and  the  set  of  events  for  which  H{t;Pi)  has  to  be  estimated  is  different  for 
different  values  of  t.  For  both  small  and  large  values  of  t,  the  number  of  events  in  the  maybe  case 
is  zero. 

^From  this,  Doguwa  and  Upton  (1990)  recommend 


^  E  ^  E  { 1  1  Pi). 

j=i  ^  1=1 

This  result  is  independent  of  whatever  is  observed  in  which  makes  estimating  G  by  Gz 

easy  because  only  areas  of  parts  of  circles  have  to  be  calculated.  However,  in  certain  cases  it  can 
introduce  serious  bias  problems  if  the  process  is  sufficiently  non-Poisson. 

3.4  Other  Estimators 


Another  estimator  for  G  recently  proposed  by  Baddeley  and  Gill  (1993)  treats  edge  effects 
as  a  censoring  problem.  Based  on  the  analogy  with  censored  survival  data,  the  distance  from 
an  event  to  its  nearest  neighbor  is  taken  to  be  right-censored  by  its  distance  to  the  border.  By 
extending  the  Kaplan-Meier  estimator  and  computing  a  cumulative  hazard  function,  they  construct 
a  ratio-unbiased  estimator  that  has  the  following  expression: 


5(0= i-n(i 

T  ^ 


Er=i  ly^i  =  r}f(wi,bi)\ 
y.'t=i  fV,Wi)f{r,bi)  )' 


where  r  in  the  product  ranges  over  those  elements  of  {tui, . . . ,  u;„}  that  are  at  most  t.  Baddeley 
and  Gill  (1993)  show  that  this  estimator  has  advantages  over  the  reduced  sample  estimator  Gi-  A 
direct  comparison  with  the  estimator  suggested  here  remains  to  be  done. 
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The  estimators  Gi  and  G2  can  be  combined  rendering  another  way  of  estimating  G.  This 
can  be  done  by  considering 


G{t)  — 


ELi  {  /( b!  )f{wi,  t)  +  /( Wi,  t)f{wi,  bi)piwi)^ 


E?=i  {  /(^  +  f(  u:i,bi)p{wi) 


This  combination  yields  an  estimator  that  performs  better  than  Gi  and  G2  both  in  terms  of  bias 
and  mean  squared  error.  This  is  due  mainly  to  the  fact  that  each  estimator  uses  different  sets  of 
the  orderings  of  t,  bi,  and  Wi.  Though  trivial,  this  seems  to  have  been  overlooked  in  the  literature. 


4.  A  New  Estimator 

Based  on  Doguwa  and  Upton's  idea,  this  section  presents  another  estimator  for  G.  The 
imputation  problem  is  approached  with  fewer  and  less  restrictive  assumptions.  As  before,  the  goal 
is  to  estimate  H(t;  Pi),  the  probability  of  finding  at  least  one  event  in  the  unobserved  part  of  b{Pi,  t) 
if  b{Pi,t)  ^  W,  For  simplicity,  we  describe  the  estimator  for  processes  on  IR^,  although  the  same 
approach  can  be  used  in  any  number  of  dimensions. 

For  this  purpose,  the  translation  Au  of  a  set  A  G  IR^  for  u  G  IR^  will  be  defined  as  Au  = 
A+u  =  {y+u\y  G  A}.  The  rotation  of  a  set  A  G  IR^  by  an  angle  0  is  defined  as  R^A  =  {Rey\y  €  A}, 
where  Ro  is  the  orthonormal  matrix  that  rotates  points  in  IR^  counterclockwise  by  an  angle  6, 

For  #  a  stationary  simple  point  process,  t  >  0  fixed,  G  $  H  W,  and  G  $  H  WQt’>  Pj  will 
be  said  to  be  analogous  to  Pi  if  ^{b{PjA)  nW  H  =  1.  In  other  words,  Pj  is  analogous  to 

Pi  if  the  only  event  of  the  translated  process  that  is  in  the  part  of  b{Pj,t)  intersecting  is 

Pj  itself.  This  is  illustrated  in  Figure  2(c). 

If  $  is  also  assumed  to  be  isotropic,  then  the  definition  of  analogous  points  can  be  extended. 
For  0  G  (0,2“],  an  event  Pj  is  analogous  at  9  to  P{  if  ${6(Pj, i)  fi  VU  H  RoWp-p^}  =  1.  Then, 
analogous  as  defined  for  stationary  processes  can  be  stated  as  being  analogous  at  0.  This  is 
illustrated  in  Figure  2(d). 

For  Pi  ,  Pj  G  $  n  VU,  where  $  is  a  stationary  simple  point  process,  define 

T  (P  )  =  I  ^  analogous  to  Pi 

*  I  0  otherwise. 

Lastly,  for  Pi  G  ^  H  fU  and  P^  G  ^  H  Wqi,  where  $  is  a  stationary  and  isotropic  simple 
point  process,  define  ^{(Pj)  as  the  total  angle  of  rotation  that  makes  Pj  analogous  to  Pi.  This  can 
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be  written  as  Oi{Pj)  ==  I{  analogous  at  ffjdO.  This  implies  that  if  ^(b(Fj,t))  =  1.  that  is,  the 
nearest  neighbor  of  Fj  is  not  within  t.  then  ffi(Fj)  =  27r.  The  angles  of  rotation  at  which  Fj  is 
analogous  to  Fi  do  not  necessarily  form  a  simple  arc. 


To  construct  the  estimator,  we  will  first  assume  that  the  underlying  process  is  simple 
and  stationary.  As  before.  H{t[F^)  denotes  the  probability  of  finding  at  least  one  event  in  the 
unobserved  part  of  b{FiA).  Instead  of  acting  as  if  the  process  were  Poisson,  this  probability  can  be 
estimated  by 


H.{t]F,) 


Tliis  estimator  looks  for  the  events  in  the  observed  realization  of  the  process  $  that  are 
analogous  to  assesses  for  such  events  whether  an  event  of  the  process  lies  in  the  part  of  the 
circle  corresponding  to  the  unobserved  part  of  b(Fi,t)^  and  computes  the  ratio  of  these  two  counts. 
Although  H{UFi)  might  be  undefined  if  there  are  no  analogous  points  to  P^,  if  at  least  one  such 
point  is  observed,  this  conditional  probability  can  be  estimated.  If  there  are  no  analogous  points 
we  will  take  H^(t;Fi)  =  0.  The  new  estimator  then  takes  the  same  form  as  Doguwa  and  Upton’s 
replacing  H{t]Fi)  with  jff*(f;P{). 


To  estimate  H{t;  Fi)  this  way  the  only  assumption  required  was  stationarity  of  the  process. 
If  it  is  believed  that  the  process  is  also  isotropic,  a  given  realization  of  $  might  provide  even  more 
information.  An  event  Fj  might  not  be  analogous  to  Fi  in  the  first  sense  defined  above,  but  there 
might  exist  a  rotation  of  the  process  that  makes  it  analogous  to  P{.  To  see  that  this  can  be  the 
case,  consider  P2  in  Figure  2(a).  Figure  2(c)  illustrates  why  Fo  is  not  analogous  to  Pi.  However, 
if  the  orientation  is  changed,  it  can  be  seen  that  there  is  at  least  one  rotation  of  $  that  makes  it 
analogous  to  Pi.  This  rotation  is  depicted  in  Figure  2(d). 


Apparently,  assuming  isotropy  increases  the  number  of  analogous  situations  dramatically. 
Nevertheless,  analogous  points  must  be  counted  in  a  different  way  than  before.  In  this  case,  it 
seems  reasonable  to  define 


where  0,{Fj)  is  the  angle  of  rotation  of  Ip-  around  Fj  as  defined  above.  This  means  that  if  Fj  is 
analogous  to  Fi  and  its  nearest  neighbor  is  not  within  t  of  it,  0i{Fj)  =  27r  and  0i{Fj)f{wj.t)  =  0, 
whereas  if  it  is,  the  angle  of  rotation  is  added  to  both  the  numerator  and  the  denominator  of 
H.M:F;), 
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Since  computing  angles  can  become  quite  a  cumbersome  task,  this  situation  can  be  approx¬ 
imated  by  doing  the  following.  For  every  event  Pj,  the  translated  part  of  the  circle  (/p, (t))  can 
be  rotated  a  finite  number  of  times  m.  That  is.  for  every  point  Pj,  m  different  values  for  Ti{Pj) 

are  obtained.  Denote  them  by  T,{P,).  (or  k  -  1 _ m.  The  counting  argument  for  calculating 

H,^it;P,)  still  applies,  and  it  can  be  approximated  by 

If  )/(■»,.<) 

;From  simulation  studies  it  was  observed  that  m  need  not  be  large.  The  difference  between  m  =  1 
and  m  =  4,  though  nonnegligible.  is  small.  Between  m  =  4  and  m  =  8,  this  difference  is  hardly 
noticeable.  Choosing  m  =  4  is  convenient  for  rectangular  W. 

5.  Discussion 

The  basic  problem  with  estimating  H{t;Pi)  as  suggested  by  Doguwa  and  Upton  (1990), 
by  acting  as  if  the  process  were  Poisson,  is  that  it  the  resulting  estimator  of  H{t;P{)  can  be 
substantially  biased  if  the  process  is  not  Poisson.  Our  procedure  is  designed  to  avoid  this  problem. 
More  specifically,  by  letting  the  observation  window  W  grow  in  an  appropriate  way,  H^{t;Pi) 
converges  to  H{t:  Pi),  whereas  H(t;  Pi)  converges  to  1  -  exp(^-Xv[b{Pi,t)  fl  ,  which  does  not 
in  general  equal  H{t\Pi).  However,  it  is  awkward  to  give  a  precise  meaning  to  these  statements 
since  the  definition  of  H{t]Pi)  depends  on  W,  Consider  the  following  related  problem,  which  does 
have  a  clear  interpretation.  For  a  fixed  set  A  containing  0,  we  want  to  estimate 

r^[A)^P{T-[^,A)\m,A)) 


where 


T{x,  A)  =  {$({a:})  =  ^{A^  H  b{x,t))  =  l}  and 
r*(a:,  /i)  =  r(r.  A)  n  {^{b{x,  t))  >  l}. 

Roughly  speaking,  we  want  to  think  of  rji  A)  as  H(t]  P,),  with  0  as  Pi,  6(0,  ^)nA  as  Ip. ,  and  6(0,  t)r\A^ 
as  Op^,  Doguwa  and  Upton  are  essentially  estimating  rj{A)  by  1  —  exp(^-Xv{b{0,t)  D  A*^)^,  while 
we  are  using  the  empirical  estimator  of  the  conditional  probability  given  by 

I{r{P„A)}/  HT{Pi,A)}. 

P,€Wet  Pi^We, 
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Then,  assuming  the  region  W  grows  in  an  appropriate  sense  and  $  is  ergodic  (Daley  and  Vere- 
Jones,  1988,  p.  335),  our  estimator  converges  almost  surely  to  r]{A),  whereas  Doguwa  and  Upton's 
converges  almost  surely  to  the  generally  incorrect  1  -  exp^-A  v{b{0,t)  H  A)^,  This  suggests  that, 
unlike  Doguwa  and  Upton's  approach,  the  proposed  imputation  procedure  is  a  sensible  way  of  cal¬ 
culating  the  conditional  probability  Hit:  Pi) ^  whether  or  not  the  underlying  process  is  Poisson,  The 
simulations  in  the  next  section  show  that  the  bias  in  Doguwa  and  Upton's  approach  to  estimating 
H(t:P^)  can  lead  to  substantial  bias  in  estimating  G  in  situations  where  our  approach  does  not 
have  this  problem. 

6.  Simulation  Results 

This  section  presents  a  comparative  simulation  study  of  the  estimators  (S\-,  i  = 
for  the  nearest  neighbor  distribution  in  order  to  evaluate  their  performance  under  different  types 
of  stationary  and  isotropic  point  processes.  The  estimator  will  be  taken  as  the  non-rotating 
version  of  the  estimator.  Simulations  considering  the  rotating  version  were  run,  yielding  only  a 
slight  improvement  over  the  non-rotating  one,  so  we  report  only  the  results  obtained  from  the 
non-rotating  version.  All  the  code  was  written  in  C  following  ANSI  standards,  and  the  simulations 
were  run  on  a  Sparc  2-Sparc  10  computer  system. 

Simulations  were  run  in  order  to  study  the  behavior  of  the  estimators  in  a  specific  large 
sample  scenario.  Though  several  large  sample  scenarios  can  be  explored  in  the  theory  of  point 
processes,  since  the  estimators  described  above  try  to  account  for  edge  effects,  it  is  of  value  to  see 
how  they  perform  if  the  sample  size  grows  and  edge  effects  remain  roughly  constant.  That  is,  it 
is  desirable  to  maintain  the  proportion  of  events  close  to  the  border  constant  and  large.  If  the 
region  W  grows,  for  example,  by  letting  both  sides  of  a  rectangle  grow  or  by  letting  the  radius 
of  a  circular  region  go  to  infinity,  edge  effects  become  negligible.  In  such  cases,  all  the  estimators 
described  above  perform  weU  in  terms  of  bias  and  variance,  and  the  differences  between  them  are 
small. 

In  order  to  maintain  constant  edge  effects,  for  the  simulations  we  wiU  consider  processes 
defined  throughout  IR^  that  are  observed  through  several  nonintersecting  windows,  and  these 
windows  are  assumed  to  be  sufficiently  apart  so  that  the  observed  realizations  of  the  processes  in 
each  one  of  them  are  essentially  independent.  Simulations  were  run  for  one  through  five  observation 
windows. 

For  every  process,  several  thousand  replications  (N)  were  run,  and  the  different  estimators 
were  calculated  for  each  one.  The  value  of  the  z— th  estimator  of  G  in  the  j—th  realization  is 
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denoted  by  G\.  By  a  replication  we  mean  that  the  process  was  simulated  in  the  relevant  number  of 

windows  m  (m  =  1, _ 5)  and  G{t)  and  H{i\Pi)  were  calculated  taking  n  =  YlJLi  which 

is  the  total  number  of  observed  events  in  the  m  windows. 

For  every  process  considered,  several  summary  statistics  were  calculated:  the  estimated 
bias,  the  average  squared  error,  and  the  difference  of  the  absolute  errors  *.  Since  the  estimated 
bias  is  usually  small,  the  average  squared  error  curves  are  in  most  cases  a  good  approximation 
to  the  variance  of  the  estimators.  The  difference  of  the  absolute  errors  makes  small  differences 
between  the  estimators  more  noticeable  than  when  the  average  squared  error  is  used.  DXEij{t) 
will  be  positive  for  those  values  of  t  for  which  Gi  is  performing,  on  average,  better  than  and 
negative  otherwise. 

6.1  Parent-Offspring  Process 

Poisson  cluster  processes,  first  suggested  by  Neyman  and  Scott  (1958)  as  a  possible  way 
of  describing  cosmological  data,  incorporate  an  explicit  form  of  spatial  clustering.  A  specific 
Xeyman-Scott  process  was  considered,  where  the  parent  events  form  a  Poisson  process  with  in¬ 
tensity  p  =  10,  and  the  distribution  of  offspring  Pm  is  given  by  M  =  (2,3,4)  with  probabilities 
Pm  =  (0.5,0.25.0.25).  The  offspring  are  then  placed  around  their  parent  following  a  bivariate  nor¬ 
mal  distribution  with  covariance  matrix  with  a  =  0.05.  For  simulation  purposes,  it  is  worth 
pointing  out  that  parents  outside  the  observation  region  can  produce  offspring  inside  W,  so  the 
process  has  to  be  simulated  on  a  region  larger  than  W. 

Figure  3  shows  the  estimated  bias  for  Gi  i  =  1 . 4  for  N  =25.000  replications  of  the 

cluster  process  simulated  on  unit  square  windows.  The  expected  number  of  events  per  unit  area  is 
27.5.  The  bias  curves  of  aU  four  estimators  foUow  a  somewhat  similar  pattern.  G3  has  the  largest 
bias,  and  in  a  small  range  of  distances  64  is  the  only  one  with  a  positive  bias.  The  average  squared 
error  curves  (not  shown)  are  all  of  about  the  same  order,  though  G4  always  shows  a  smaller  average 
squared  error  than  G\  and  G2.  Except  for  large  distances,  this  is  also  true  for  G3  when  compared 
to  Gi  and  G2.  Gi  has  the  largest  ASE.  The  average  difference  of  the  absolute  error  curves  enlarge 
these  differences,  showing  that  G4  performs  uniformly  better  than  G\  and  G2.  For  small  distances, 

*  For  a  fixed  value  of  t,  the  computing  formulas  used  were; 

Estimated  bias:  EB  =  G(t)}. 

Average  squared  error:  ASE,(f)  = 

Difference  of  the  absolute  errors:  DAE,-,j(t)  =  y  ^k=i  “  G(t)\  -  \Gj{t)  -  (S(f)|}. 
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Gz  performs  somewhat  better  than  G4.  and  the  trend  gets  reversed  as  the  distance  t  increases.  This 
can  be  appreciated  in  Figure  4,  where  the  performance  of  the  estimators  in  the  multiple  window 
scenario  (2.  3.  4,  and  5  unit  square  windows)  is  shown.  The  bias  of  Gi,  G2,  and  G4  decreases  in 
every  case,  whereas  G'3  has  a  nontrivial  bias  that  does  not  diminish,  regardless  of  the  number  of 
windows,  DAEi,4  ^.nd  DAE2.4  decrease,  due  mainly  to  the  increase  in  the  sample  size  and  that 
both  G’l  and  are  ratio  unbiased.  On  the  other  hand,  DAE3.4  shows  the  opposite  trend,  since 
its  negative  part  gets  closer  to  zero,  and  the  range  of  positive  differences  gets  larger  as  the  number 
of  windows  increases,  which  means  the  relative  performance  of  G4  improves  with  respect  to  that 
of  G3.  For  five  windows,  except  at  very  short  distances,  DAE3,4  is  positive,  suggesting  that  as  the 
number  of  windows  increases.  G4  becomes  a  uniformly  better  estimator  than  G3. 

6.2  A  Regular  Process 

Several  models  for  processes  that  exhibit  some  regularity  have  been  proposed  in  the  litera¬ 
ture.  In  order  to  investigate  the  behavior  of  the  estimators  in  this  case,  a  perturbed  grid  was  chosen. 
First,  a  grid  where  the  events  are  separated  by  a  distance  d  in  each  direction  is  constructed.  This 
grid  is  randomly  placed  on  the  plane  after  rotating  it  by  an  angle  6  chosen  uniformly  in  the  interval 
(0. 2;r].  Then  every  event  is  moved  randomly  from  its  position  —  hence  perturbed  —  independently 
from  the  other  events.  The  distance  each  event  moves  follows  a  bivariate  normal  distribution  with 
covariance  matrix  centered  at  their  unperturbed  position.  The  parameters  of  the  grid  consid¬ 
ered  for  this  section  were  d  =  0.15  and  a  =  0.02.  The  final  pattern  is  definitely  regular,  but  it  is 
not  apparent,  given  the  expected  number  of  events  per  observation  window  (1/0.15^  ^  44),  that 
the  original  process  was  an  evenly  spaced  grid.  Given  the  time  required  to  run  this  process  on  the 
multiple  window  scenario,  for  one  and  two  windows,  N  =25,000  replications  were  run:  for  three, 
y  =15.000,  and  for  four  and  five,  N  =  7500. 

Figure  5  shows  the  bias  of  G3  and  G4  for  the  multiple  window  scenario.  The  shaded 
regions  represent  approximate  95%  confidence  intervals  for  the  bias  at  distance  t,  G4  appears  to 
be  unbiased,  except  at  the  the  range  0.14  <  t  <  0.17,  where  there  exists  a  small  positive  bias  that 
seems  to  decrease  as  the  number  of  windows  increases.  On  the  other  hand,  G3  has  a  nontrivial 
bias  that  does  not  change  with  the  number  of  windows.  The  estimators  Gi  and  G2  (bias  curves 
not  shown)  are  for  aU  practical  purposes  unbiased.  Figure  6  shows  the  average  squared  error  for 
the  four  estimators.  It  can  be  appreciated  that  overall,  G4  has  the  smallest  average  squared  error 
of  the  four  estimators.  The  behavior  of  G3  is  due  mainly  to  the  estimation  of  a  nontrivial  positive 
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value  for  H  for  events  in  the  maybe  case:  the  true  value  of  this  probability  in  this  case  is  very  close 
to  zero. 

6.3  Poisson  Processes 

In  the  case  of  the  Poisson  process,  it  will  not  be  surprising  to  see  that  for  almost  every 
r.  Gs  performs  better  than  the  other  three  estimators.  A  Poisson  process  with  intensity  A  =  20 
was  simulated  on  the  unit  square.  For  each  number  of  windows  (one  through  five),  .V  =25,000 
replications  were  run.  The  patterns  the  estimated  bias  and  average  squared  error  curves  show  do 
not  change  qualitatively  with  the  number  of  windows.  Hence,  Figure  7  shows  these  two  curves  for 
ony  one  window.  G\  consistently  has  a  larger  bias  than  G2  a^nd  G3.  It  is  noticeable  that  G2  has 
overall  the  smallest  bias.  G4  has  a  small  bias,  but  it  is  the  only  estimator  that  shows  a  positive 
bias  in  some  range.  Gi  has  the  largest  average  squared  error,  which  becomes  indistiguishable  from 
that  of  G2  as  the  number  of  windows  increases.  As  expected,  G3  has  overall  the  smallest  average 
squared  error:  that  of  G4  lies  between  ASE3  and  ASE2. 

Figure  8  shows  the  average  of  the  difference  of  absolute  error  curves.  For  aU  t.  DAE14 
and  DAE24  are  positive.  Hence,  this  criterion  indicates  that  on  average  G4  performs  better  than 
Gi  and  G2.  On  the  other  hand,  DAE3.4  is  negative  for  almost  every  t  (except  when  t  is  large), 
indicating  that  G3  does  a  better  job  than  G4  estimating  the  nearest  neighbor  distribution  if  the 
underlying  process  is  Poisson. 

7.  Conclusion 

The  three  types  of  processes  studied  in  the  previous  sections  w'ere  chosen  in  order  to  analyze 
and  compare  the  behavior  of  four  estimators  of  the  nearest  neighbor  distribution  under  diverse 
alternatives.  If  the  process  is  Poisson  or  very  nearly  Poisson,  G3  wiU  be  the  best  estimator  because 
of  the  way  the  maybe  case  is  handled:  the  unknown  probability  is  imputed  with  the  approximate 
corresponding  Poisson  probability,  which  in  these  cases  will  be  very  close  to  the  truth.  On  the  other 
hand,  by  imputing  using  a  more  empirical  approach,  G4  does  nearly  as  well  in  these  circumstances. 
That  is.  G4  does  not  depend  on  a  Poisson  assumption  for  its  edge  correction  to  make  sense,  and 
that  is  the  reason  that  in  non-Poisson  processes,  where  G3  sometimes  performs  badly,  G4  performs 
well.  In  terms  of  mean  squared  error,  in  all  three  processes  considered  G4  performs  uniformly 
better  than  G’l  and  ^2?  ^'iid  the  bias  of  the  three  is  of  the  same  order.  Thus,  overall,  G4  appears 
to  be  a  better  estimator  than  Gi,  G2,  and  G3. 
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Figure  1.  P  is  an  event  for  which  biPA)  is  not  fully  contained  in  W.  Ipit)  and  Op(t)  (shaded 
area)  represent  the  parts  of  biPA)  that  are  inside  and  outside  IT  respectively.  Given  that  Ip{t) 
contains  only  one  event  of  the  process  (P  itself),  the  objective  is  to  estimate  the  probability  of 
there  oeing  at  least  one  event  in  Op{t). 

Figure  2,  Denote  by  Hn  the  part  of  b(PiA)  that  lies  outside  the  window  W.  and  Va  the  part  of 
it  that  lies  inside.  Since  in  (a)  Va  is  empty,  Pi  is  in  the  maybe  case  as  described  in  the  text.  In 
(b).  it  can  be  seen  that  P4  is  analogous  to  Pi  because  only  contains  P4,  whereas  in  (c)  it  can 
be  observed  that  P2  is  not  because  P3  is  also  in  Uc.  In  (d),  by  rotating  b{P2,t)  by  approximately 
157^  counterclockwise,  Po  becomes  analogous  to  Pi.  In  fact,  there  exists  a  continuum  of  angles  for 
which  this  is  case. 

Figure  3*  This  figure  shows  the  estimated  bias  of  the  four  estimators  when  the  Neyman-Scott 
process  described  in  the  text  is  observed  through  multiple  windows.  For  each  number  of  windows, 
N  =  25.000  replications  were  run.  Gi,  G2,  and  G4  are  plotted  using  the  same  vertical  scale.  Since 
the  bias  of  G3  is  larger  than  the  other  biases  by  one  order  of  magnitude,  it  is  plotted  separately. 

Figure  4.  These  plots  show  the  average  difference  of  the  absolute  errors  for  Gi,  G2,  and  G3  when 
compared  to  G4  when  the  Neyman-Scott  process  described  in  the  text  is  observed  through  two, 
three,  four,  and  five  independent  unit  square  windows. 

Figure  5.  These  figures  show  the  average  bias  curves  of  G3  and  G4  in  the  multiple  window  scenario 
for  the  perturbed  grid.  The  shaded  areas  represent  95%  confidence  intervals  for  the  average  bias. 
Notice  that  the  vertical  scales  are  not  the  same. 

Figure  6,  These  plots  depict  the  average  squared  error  curves  of  the  four  estimators  when  the 
perturbed  grid  is  observed  through  multiple  windows  (one  through  five).  All  estimators  have  their 
ASE's  in  the  same  range. 

Figure  7.  For  one  observation  unit  square  window,  the  bias  and  average  squared  error  of  the  four 
estimators  when  the  process  is  Poisson  are  shown. 

Figure  8.  These  plots  show  the  average  difference  of  the  absolute  errors  obtained  by  comparing 
Gi,  G:,  and  G3  to  G4  for  simulations  on  one  through  five  windows  when  the  process  is  Poisson. 
AU  three  plots  have  the  same  vertical  scale. 
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